Continuously Differentiable Exponential Linear Units
Abstract
Exponential Linear Units (ELUs) are a useful rectifier for constructing deep learning architectures, as they may speed up and otherwise improve learning by not having vanishing gradients and by having mean activations near zero [1]. However, the ELU activation as parametrized in [1] is not continuously differentiable with respect to its input when the shape parameter α is not equal to 1. We present an alternative parametrization which is C¹ continuous for all values of α, making the rectifier easier to reason about and making α easier to tune. This alternative parametrization has several other useful properties that the original parametrization of ELU does not: 1) its derivative with respect to x is bounded, 2) it contains both the linear transfer function and ReLU as special cases, and 3) it is scale-similar with respect to α.

The Exponential Linear Unit as described in [1] is as follows:

    ELU(x, α) = x                  if x ≥ 0
                α(exp(x) − 1)      otherwise        (1)

where x is the input to the function and α is a shape parameter. The derivative of this function with respect to x is:

    d/dx ELU(x, α) = 1             if x ≥ 0
                     α exp(x)      otherwise        (2)

In Figures 1a and 1b we plot this activation and its derivative with respect to x for different values of α. We see that when α ≠ 1, the activation's derivative is discontinuous at x = 0. Additionally, we see that large values of α can cause a large ("exploding") gradient for small negative values of x, which may make training difficult.
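The discontinuity described above is easy to verify numerically. The following is a minimal NumPy sketch of Eqs. (1) and (2) (the function names `elu` and `elu_grad` are our own, not from [1]); evaluating the derivative just left and right of x = 0 with α = 2 shows the jump:

```python
import numpy as np

def elu(x, alpha):
    """ELU, Eq. (1): x for x >= 0, alpha * (exp(x) - 1) otherwise."""
    # np.minimum(x, 0) keeps exp() from overflowing on the unused positive branch.
    return np.where(x >= 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def elu_grad(x, alpha):
    """Derivative of ELU w.r.t. x, Eq. (2): 1 for x >= 0, alpha * exp(x) otherwise."""
    return np.where(x >= 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))

alpha, eps = 2.0, 1e-6
print(elu_grad(np.array([eps]), alpha))   # ~1.0 approaching 0 from the right
print(elu_grad(np.array([-eps]), alpha))  # ~2.0 approaching 0 from the left
```

With α = 1 the two one-sided values agree, which is why the original ELU is only C¹ at that single setting of the shape parameter.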
Our alternative parametrization of the ELU, which we dub "CELU", is simply the ELU where the activation for negative values has been modified to ensure that the derivative at x = 0 is 1 for all values of α:

    CELU(x, α) = x                     if x ≥ 0
                 α(exp(x/α) − 1)       otherwise        (3)

Note that ELU and CELU are identical when α = 1:

    ∀x  ELU(x, 1) = CELU(x, 1)        (4)

The derivatives of the activation with respect to x and α are as follows:

    d/dx CELU(x, α) = 1               if x ≥ 0
                      exp(x/α)        otherwise        (5)

    d/dα CELU(x, α) = 0                            if x ≥ 0
                      exp(x/α)(1 − x/α) − 1        otherwise

Like in ELU, derivatives for CELU can be computed efficiently by precomputing exp(x/α) and using it for the activation and its derivatives. Unlike ELU, CELU is scale-similar as a function of x and α:

    CELU(x, α) = (1/c) CELU(cx, cα)        (6)

The CELU also converges to ReLU as α approaches 0 from the right, and converges to a linear "no-op" activation as α approaches ∞:

    lim_{α→0⁺} CELU(x, α) = max(0, x)        (7)
    lim_{α→∞}  CELU(x, α) = x                (8)

This gives the CELU a nice interpretation as a way to interpolate between a ReLU and a linear function using α. Naturally, CELU can be slightly shifted in x and y such that it converges to any arbitrarily shifted ReLU, in case negative activations are desirable even for small values of α.

References

[1] D. Clevert, T. Unterthiner, and S. Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). CoRR, abs/1511.07289, 2015.

arXiv:1704.07483v1 [cs.LG] 24 Apr 2017
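The claimed properties of CELU can be checked numerically. Below is a hedged NumPy sketch (the names `celu` and `celu_grad` are ours) that verifies the C¹ continuity at x = 0, the scale-similarity of Eq. (6), and the ReLU limit of Eq. (7) for a sample of inputs:

```python
import numpy as np

def celu(x, alpha):
    """CELU, Eq. (3): x for x >= 0, alpha * (exp(x/alpha) - 1) otherwise."""
    # np.minimum(x, 0) keeps exp() from overflowing on the unused positive branch.
    return np.where(x >= 0, x, alpha * (np.exp(np.minimum(x, 0.0) / alpha) - 1.0))

def celu_grad(x, alpha):
    """Derivative of CELU w.r.t. x, Eq. (5): 1 for x >= 0, exp(x/alpha) otherwise."""
    return np.where(x >= 0, 1.0, np.exp(np.minimum(x, 0.0) / alpha))

x = np.linspace(-3.0, 3.0, 7)

# The derivative is continuous at x = 0 for any alpha (both values ~1.0):
print(celu_grad(np.array([-1e-8, 1e-8]), 5.0))

# Scale-similarity, Eq. (6): CELU(x, alpha) == (1/c) * CELU(c*x, c*alpha)
c, alpha = 3.0, 0.7
print(np.allclose(celu(x, alpha), celu(c * x, c * alpha) / c))  # True

# Eq. (7): as alpha -> 0+, CELU approaches ReLU.
print(np.allclose(celu(x, 1e-6), np.maximum(x, 0.0), atol=1e-5))  # True
```

The single shared `exp(x/α)` term in `celu` and `celu_grad` is what makes the precomputation mentioned above possible in a fused forward/backward implementation.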
Journal: CoRR
Volume: abs/1704.07483
Year: 2017